perm filename BOOST[TLK,DBL] blob sn#198233 filedate 1976-01-27 generic text, type C, neo UTF8
A UNIFIED "Boosting" ALGORITHM

It would be nice to have a single algorithm for computing the worth of
each newly-created concept, and each newly-proposed job.

FOR A NEW JOB

Say that B.P proposes a new job J = "(O C F) because R" where
O is an operator like "fillin" or "check"
C is a concept like "composition" or "sets"
F is a facet like "examples" or "generalizations"
R is a reason like "it would enable AM to Fillin Examples of Union"

This event occurs in a context, where AM is trying to satisfy the current job
CAND = "(CO CC CF) because CR".
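
For concreteness, a job of the form "(O C F) because R" could be represented
as below. This is only a sketch: the class name, the field names, and the
key() helper are hypothetical, chosen to mirror the letters used above, and
are not taken from AM itself.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """A job "(O C F) because R". All names here are illustrative assumptions."""
    operator: str                                 # O, e.g. "fillin" or "check"
    concept: str                                  # C, e.g. "composition" or "sets"
    facet: str                                    # F, e.g. "examples"
    reasons: list = field(default_factory=list)   # every R given so far
    priority: float = 0.0                         # numeric worth, 0 to 1000

    def key(self):
        """Two jobs are the same job when their (O C F) triples match."""
        return (self.operator, self.concept, self.facet)


# The newly proposed job J and the current job CAND (values are made up).
j = Job("fillin", "sets", "examples",
        ["it would enable AM to Fillin Examples of Union"])
cand = Job("check", "composition", "generalizations", ["CR"], priority=600)
```

Keying on the (O C F) triple alone, and keeping the reasons in a separate
list, is what lets a later proposal of the same triple be recognized as a
repeat.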

The totality of variables that might come into play in determining the new
numeric value for J are:
	The interestingness numbers associated with O,C,F,CO,CC,CF.
	The priority number of CAND (which in turn depends on CR)
	A local estimate of the worthwhileness of reason R.
	The priority and reasons of a job (O C F) which already is pending.


The formula is really a 3-step process:
(1) Derive the priority of J assuming J is a brand-new job.
(2) Reconcile this with any job (O C F) already in the job-list.
(3) Normalize the priority (between INTHRESH and 1000).


Each of these steps needs much elaboration:
(1) Derive a number X which would be the priority of J if (O C F) weren't already
	in the job-list. At the moment, this is done locally (by B.P), but
	Bruce Buchanan has suggested making this obey one global law.
	A sample of this unifying formula might be:
	X = DOTPRODUCT[ (I(O), I(C), I(F), I(CO), I(CC), I(CF), prio(CAND)),
	                (.1,   .2,   .2,   .05,   .1,    .1,    .25) ]
	    x worth(R)
	where I(z) means "interest value of z" and ranges from 0 to 1000,
	and prio(CAND) is the priority of the current job CAND (again 0-1000),
	and worth(R) is estimated locally and ranges from 0 to 1.
	It could be that this is not enough, or may even be more than necessary.
	(e.g., I(CO) may be irrelevant, and the whole formula might need to be
	different depending on O or F).
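
A minimal sketch of step (1), using the sample weights given above. The
function and parameter names are hypothetical; only the weights and the
ranges (0 to 1000 for interest values and priorities, 0 to 1 for worth(R))
come from the text.

```python
# Sample weights for I(O), I(C), I(F), I(CO), I(CC), I(CF), prio(CAND).
WEIGHTS = (0.1, 0.2, 0.2, 0.05, 0.1, 0.1, 0.25)


def fresh_priority(i_o, i_c, i_f, i_co, i_cc, i_cf, prio_cand, worth_r):
    """Priority X of job J, ignoring any copy of (O C F) already pending."""
    values = (i_o, i_c, i_f, i_co, i_cc, i_cf, prio_cand)
    dot = sum(w * v for w, v in zip(WEIGHTS, values))
    return dot * worth_r


# If every interest value and prio(CAND) is 500 and worth(R) is 0.8,
# X comes out at roughly 400.
x = fresh_priority(500, 500, 500, 500, 500, 500, 500, 0.8)
```

Because the weights sum to 1, the dot product stays between 0 and 1000, and
multiplying by worth(R) can only lower it.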

(2) Search the job-list for (O C F).
	If it is not found, insert J with priority X.
	If it is found, say it has priority Y. 
		If R is one of the reasons already given, then the new
			priority is 1+ Larger(X,Y).
			Perhaps a better compromise would be
 			1/k↑2 + Larger(X,Y), where J was proposed k times so far.
		If R is a new reason, the priority is
			SQRT(X↑2 + Y↑2).
			Perhaps a better estimate is the geometric mean:
			KthROOT(X x Y↑(k-1)) where k-1 reasons exist already.
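
The reconciliation rules of step (2) might be sketched as follows. The
job-list is assumed here to be a dictionary keyed by the (O, C, F) triple;
that representation and all function names are hypothetical. The two
"perhaps better" variants are given as separate helpers and are not wired
into reconcile().

```python
import math


def reconcile(job_list, key, reason, x):
    """Insert job `key` with priority x, or boost the pending copy."""
    entry = job_list.get(key)
    if entry is None:
        # Not found: insert J with priority X.
        job_list[key] = {"priority": x, "reasons": [reason]}
    elif reason in entry["reasons"]:
        # Same reason given again: 1 + Larger(X, Y).
        entry["priority"] = 1 + max(x, entry["priority"])
    else:
        # New reason: SQRT(X^2 + Y^2).
        entry["priority"] = math.sqrt(x ** 2 + entry["priority"] ** 2)
        entry["reasons"].append(reason)
    return job_list[key]["priority"]


def repeated_reason_alt(x, y, k):
    """Variant: 1/k^2 + Larger(X, Y), where J was proposed k times so far."""
    return 1 / k ** 2 + max(x, y)


def new_reason_alt(x, y, k):
    """Variant: KthROOT(X * Y^(k-1)), where k-1 reasons exist already."""
    return (x * y ** (k - 1)) ** (1 / k)


jobs = {}
key = ("fillin", "sets", "examples")
p1 = reconcile(jobs, key, "reason 1", 300)   # inserted with priority 300
p2 = reconcile(jobs, key, "reason 1", 200)   # repeated reason: 1 + max(200, 300)
p3 = reconcile(jobs, key, "reason 2", 400)   # new reason: sqrt(400^2 + 301^2)
```

Note that the root-sum-of-squares rule always yields a priority at least as
large as either X or Y, whereas the geometric-mean variant lies between
them, so the two alternatives behave quite differently.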

(3) If the priority is below INTHRESH, discard the job (or save temporarily).
	If the priority is >1000, replace it by 1000.
	Perhaps a better strategy is to scale all jobs' priorities down so that
	the highest one is just at 1000. Keep this global scaling factor around,
	and multiply each new job's priority by it (and likewise INTHRESH and
	DOthresh). Only increase it when a new job rates a priority > 1000;
	decrease it when the highest job is below, say, 400.
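
Step (3) could look roughly like this. The value chosen for INTHRESH is an
assumption made only so the example runs (the text does not give one), and
the function names are hypothetical. rescale() illustrates just the
"scale everything so the top job sits at 1000" part of the alternative
strategy; it does not carry the factor over to later jobs or thresholds.

```python
INTHRESH = 100     # assumed value, for illustration only
MAX_PRIO = 1000


def normalize(priority, inthresh=INTHRESH):
    """Return the clipped priority, or None if the job should be discarded."""
    if priority < inthresh:
        return None
    return min(priority, MAX_PRIO)


def rescale(priorities):
    """Scale all priorities down so the highest is exactly MAX_PRIO.

    Returns the scaled list and the factor used (1.0 if nothing exceeds
    MAX_PRIO).
    """
    top = max(priorities)
    factor = MAX_PRIO / top if top > MAX_PRIO else 1.0
    return [p * factor for p in priorities], factor


scaled, factor = rescale([2000, 500])   # factor 0.5, priorities 1000 and 250
```

Clipping loses the ordering among jobs rated above 1000, which is
presumably what motivates the rescaling alternative.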



FOR A NEW CONCEPT

Again, it would be nice to have a single formula which computed the
estimated worth of each newly-created concept, instead of doing it locally.
The basic variables one would consider are:
	The interest values for the new concept C, 
		and old ones C1,..,Cn from which C was derived.
	The interest values of CO, CC, CF, and priority of CAND (as above).
The formula might be something like this:
Int(C) = DOTPROD[ (prio(CAND), I(CO), I(CC), I(CF), I(C1), ..., I(Cn)),
                  (.3,         .1,    .2,    .2,    .2/n,  ..., .2/n) ]
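
A sketch of this formula, taking the CAND term to be the priority of CAND
as in the list of variables above. The function and parameter names are
hypothetical; the weights are the sample ones just given.

```python
def concept_interest(prio_cand, i_co, i_cc, i_cf, parent_interests):
    """Estimated interest of a new concept C derived from C1, ..., Cn.

    parent_interests holds I(C1), ..., I(Cn); each gets weight .2/n.
    """
    n = len(parent_interests)
    if n == 0:
        raise ValueError("a derived concept needs at least one parent")
    context = 0.3 * prio_cand + 0.1 * i_co + 0.2 * i_cc + 0.2 * i_cf
    parents = sum(0.2 / n * i for i in parent_interests)
    return context + parents


# With every input at 500 the estimate is roughly 500, since the weights
# sum to 1 regardless of n.
worth = concept_interest(500, 500, 500, 500, [500, 500])
```

Dividing the parents' share by n means the parent concepts contribute their
average interest, so a concept derived from many parents is not favored
merely for having many.
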
An analogous "pre-existing" situation occurs often, where a new concept is
already found to exist; this is typically nipped in the bud, before an
interest value is even computed. One alternative is to compute
this value, and use it to boost that concept's worth (and then normalize
all worth facets), analogous to the job-value scheme above.
Again, one must worry about the details of this meshing, about the details of
the unified formula for estimating the worth of the new concept, etc.

Some experiments one must do involve radically altering these global formulae,
to see how critically the performance of AM depends on good initial estimates
of the worth of jobs and concepts. Hopefully, the answer is "some, but not heavily."